GenomeCompress: A Novel Algorithm for DNA Compression

Authors

  • Umesh Ghoshdastider
  • Banani Saha
Abstract

The genome of an organism contains all of its hereditary information, encoded in DNA. Sequencing the genome is therefore extremely important, since it determines how the organism survives, develops and multiplies. Over the past three decades, massive DNA sequencing efforts have yielded the complete genome sequences of a large number of organisms, including humans, and genomic databases are growing exponentially with time. Because genomes are so large, an efficient algorithm is required to compress them. General text compression algorithms do not exploit the specific characteristics of a DNA sequence, whereas DNA-specific compression algorithms exploit the repetitiveness of bases in DNA sequences. A repetitive DNA sequence is best compressed using a dictionary-based compression algorithm. Non-repetitive parts of the DNA are generally compressed using dynamic programming: the sequence is divided into square matrices that each contain repeats of a single base, each matrix is replaced by that base, and the order of the matrix is recorded in a string. In this paper, a novel algorithm for DNA compression is proposed that compresses both repetitive and non-repetitive DNA sequences. The algorithm is also compared with existing ones and is found to achieve a better compression ratio.
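To make the idea of dictionary-based compression of repetitive DNA concrete, here is a minimal sketch of a generic LZ78-style coder applied to a base string. It is not the algorithm proposed in the paper, whose details the abstract does not give; the function names `lz78_encode` and `lz78_decode` and the sample sequence are assumptions chosen for illustration only.

```python
def lz78_encode(seq):
    """Encode seq as (dictionary_index, next_base) pairs, LZ78 style.

    Illustrative only: a generic dictionary coder, not the paper's method.
    """
    dictionary = {"": 0}  # phrase -> index; index 0 is the empty phrase
    pairs = []
    phrase = ""
    for base in seq:
        candidate = phrase + base
        if candidate in dictionary:
            # Keep extending the match while the phrase is already known.
            phrase = candidate
        else:
            # Emit the longest known prefix plus the new base, then learn it.
            pairs.append((dictionary[phrase], base))
            dictionary[candidate] = len(dictionary)
            phrase = ""
    if phrase:
        # Flush a trailing phrase that is already in the dictionary.
        pairs.append((dictionary[phrase], ""))
    return pairs


def lz78_decode(pairs):
    """Rebuild the original sequence from the (index, base) pairs."""
    phrases = [""]  # index -> phrase, mirroring the encoder's dictionary
    pieces = []
    for index, base in pairs:
        phrase = phrases[index] + base
        phrases.append(phrase)
        pieces.append(phrase)
    return "".join(pieces)


if __name__ == "__main__":
    sample = "ACGTACGTACGTACGT"  # hypothetical repetitive sequence
    encoded = lz78_encode(sample)
    print(len(sample), "bases ->", len(encoded), "pairs")
    print(lz78_decode(encoded) == sample)
```

On a repetitive input the number of emitted pairs grows more slowly than the number of bases, because each new dictionary phrase can cover a longer stretch of already-seen sequence. A practical coder would also pack the indices and bases into bits, which this sketch leaves out.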


Similar resources

A Novel Color Image Compression Method Using Eigenimages

Since the birth of multi–spectral imaging techniques, there has been a tendency to consider and process this new type of data as a set of parallel gray–scale images, instead of an ensemble of an n–D realization. Although, even now, some researchers make the same assumption, it is proved that using vector geometries leads to better results. In this paper, first a method is prop...


DNABIT Compress – Genome compression algorithm

Data compression is concerned with how information is organized in data. Efficient storage means removal of redundancy from the data being stored in the DNA molecule. Data compression algorithms remove redundancy and are used to understand biologically important molecules. We present a compression algorithm, "DNABIT Compress" for DNA sequences based on a novel algorithm of assigning binary bits...


Parallelizing Assignment Problem with DNA Strands

Background: Many problems of combinatorial optimization, which are solvable only in exponential time, are known to be Non-Deterministic Polynomial hard (NP-hard). With the advent of parallel machines, new opportunities have emerged to develop effective solutions for NP-hard problems. However, solving these problems in polynomial time needs massive parallel machines and ...


Implementation of VlSI Based Image Compression Approach on Reconfigurable Computing System - A Survey

Image data require huge amounts of disk space and large bandwidths for transmission. Hence, image compression is necessary to reduce the amount of data required to represent a digital image. Therefore an efficient technique for image compression is in high demand. Although lots of compression techniques are available, the technique which is faster, memory efficient and simple, surely...


Determining the Proper compression Algorithm for Biomedical Signals and Design of an Optimum Graphic System to Display Them (TECHNICAL NOTES)

In this paper the need for employing a data reduction algorithm in using digital graphic systems to display biomedical signals is firstly addressed and then, some such algorithms are compared from different points of view (such as complexity, real time feasibility, etc.). Subsequently, it is concluded that Turning Point algorithm can be a suitable one for real time implementation on a microproc...


Publication date: 2007